This assignment is for ETC5521 Assignment 1 by Team wallaby, comprising Helen Evangelina and Rahul Bharadwaj.
Music, in a broad sense, is any art composed of sound. It expresses people's thoughts and emotions, carries the author's life experience and feelings, and brings people the enjoyment of beauty. At the same time, music is a form of social behaviour through which people exchange feelings and life experiences.
In ancient times, courts would play music at banquets, and scholars visiting scenic landscapes would play music to add to the occasion. In modern times, however, classical music's barrier to entry is high and its development has largely reached its limit, so it has become the preserve of a very small group, while pop music (a general name for popular songs, including Rock, R&B, Latin, etc.) has gradually developed its own character. Modern songs have quietly taken the top position in people's hearts thanks to their strength in conveying emotion and life experience, and listening to pop music has become one of the most common forms of daily entertainment.
Nowadays, music plays an indispensable role in people's lives, helping them manage and improve their quality of life. As fans of music, we not only enjoy it but also wonder how it strikes people's hearts with simple tones, rhythms, timbres and words. How important is genre to a song's performance? How much influence do the genre, and the various attributes of a song, have on its reception? What is it that we like in music: that it makes us dance or sing along unconsciously, or that it conveys our emotions and echoes our thoughts? These questions motivate our study. With music streaming services springing up everywhere, after careful consideration our group decided to select Spotify as the research object. First, let us introduce Spotify.
Spotify is a legal music streaming platform supported by Warner Music, Sony, EMI and other major record companies around the world. It now has more than 60 million users and is the world's leading large-scale online music streaming platform.
Because Spotify holds a large amount of user data, four interested users, Charlie Thompson, Josiah Parry, Donal Phipps and Tom Wolff, created the spotifyr package, which uses Spotify's API to make it easier for everyone to explore their own preferences or the mainstream listening habits of most people. It is also the source of our group's assignment data.
In addition to the spotifyr data, our dataset incorporates data from a blog post by Kaylin Pavlik, in which six main categories (EDM, Latin, pop, R&B, rap, rock) are used to classify 5000 songs. The combination of the two datasets is very useful for studying the popularity of pop music.
By doing this exploratory data analysis, we want to know:
Main Question: Which audio features influence the popularity of music works and contribute to the emergence of top songs?
Sub Questions:
Since 1957, what are the audio features of the top artists who have produced the most music?
Exploring the works of our favourite artist, Coldplay: for example, how much musical positiveness do their albums convey?
With so many modern music genres around today, what unique style or charm makes a genre stand out and become people's first choice?
The data for this report is part of the TidyTuesday challenge and comes from Spotify via the spotifyr package.
The variables in this dataset are X, track_id, track_name, track_artist, track_popularity, track_album_id, track_album_name, track_album_release_date, playlist_name, playlist_id, playlist_genre, playlist_subgenre, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo and duration_ms. The time frame of collection is from 1957-01-01 to 2020-01-29.
Data collection methods: the spotifyr package can extract track audio features and other related information in batches from Spotify's Web API. For example, to search for an artist, simply type in their name and all of their albums and songs are listed in seconds. The package also records the popularity metrics of all tracks and albums, making it easy to study the correlation between music popularity and music characteristics. Jon Harmon and Neal Grantham then took the spotifyr data and added the content of Kaylin Pavlik's recent blog post classifying the genre of nearly 5000 songs, generating the TidyTuesday dataset we use for this assignment.
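The batch extraction described above can be sketched with a few spotifyr calls; this is a minimal sketch assuming valid Spotify API credentials (the environment-variable values below are placeholders):

```r
library(spotifyr)

# Authenticate against Spotify's Web API (placeholder credentials).
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")
access_token <- get_spotify_access_token()

# Typing in an artist's name returns their tracks together with the audio
# features used in this report (danceability, energy, valence, tempo, ...).
coldplay_tracks <- get_artist_audio_features("Coldplay")
```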
We chose music works created by artists that can be found on Spotify from January 1, 1957 to January 29, 2020.
library(DT)
library(tibble)

# Data dictionary for the key track variables; the descriptions match the
# table below.
variables <- tibble(
  "Variable" = c("track_id", "track_name", "track_artist", "track_popularity",
                 "track_album_id", "track_album_name", "track_album_release_date"),
  "Description" = c("Song unique ID", "Song Name", "Song Artist",
                    "Song Popularity (0-100) where higher is better",
                    "Album unique ID", "Song album name", "Date when album released")
)
| variable | class | description |
|---|---|---|
| track_id | character | Song unique ID |
| track_name | character | Song Name |
| track_artist | character | Song Artist |
| track_popularity | double | Song Popularity (0-100) where higher is better |
| track_album_id | character | Album unique ID |
| track_album_name | character | Song album name |
| track_album_release_date | character | Date when album released |
| playlist_name | character | Name of playlist |
| playlist_id | character | Playlist ID |
| playlist_genre | character | Playlist genre |
| playlist_subgenre | character | Playlist subgenre |
| danceability | double | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| key | double | The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1. |
| loudness | double | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| mode | double | Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0. |
| speechiness | double | Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. |
| acousticness | double | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. |
| instrumentalness | double | Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. |
| liveness | double | Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. |
| valence | double | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
| tempo | double | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
| duration_ms | double | Duration of song in milliseconds |
variables %>% datatable(filter = "top", rownames = FALSE, options = list(pageLength = 5))
Now we clean the data: we select the variables useful for our EDA and retain the six major music genres (the proportions of the other genres are very low and can be ignored). We then arrange the data from high to low by track popularity.
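The cleaning step can be sketched as follows, assuming the raw TidyTuesday data frame is called `spotify_songs` (column names as in the data dictionary above):

```r
library(dplyr)

six_genres <- c("edm", "rap", "pop", "r&b", "latin", "rock")

clean_songs <- spotify_songs %>%
  # keep only the variables used in our EDA
  select(track_name, track_artist, track_popularity,
         track_album_release_date, playlist_genre,
         danceability:duration_ms) %>%
  # retain the six major genres
  filter(playlist_genre %in% six_genres) %>%
  # arrange from the most to the least popular track
  arrange(desc(track_popularity))
```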
From the following table and figure, we can see that Queen, Martin Garrix and The Chainsmokers occupy the first, second and third places respectively. There are also many other famous artists on the list, such as Drake, Maroon 5 and Ed Sheeran.
Similarly, the artists with the most songs are shown in the bar plot. Our group decided to present this in two forms: a textual comparison (using datatable) and an intuitive figure. This helps to deepen our impression of the top 20 artists and gives an intuitive sense of the gaps between them.
Top 20 Artists who wrote the most songs from 1941 to 2020
Next is a radar plot. We filter for artists whose popularity is greater than 95 and load the data into this type of plot. In this way, the most popular singers can be seen at a glance, and music lovers can learn the characteristics of these top singers' works.
First, we can see that Maroon 5, The Weeknd, Roddy Ricch and KAROL G are overwhelming in popularity, because the size of each pie chart represents the level of popularity. It is also clear that popular singers usually create songs in many genres rather than being limited to a single one. Looking at the styles of the different artists' works, there are great differences.
For example, from the brightness of the colors, we can see that the energy of Maroon 5's and Billie Eilish's works is not very high. This is not a shortcoming but a style: their music is lyrical and soft. Judging from the color of each fan-shaped boundary line, Roddy Ricch's and Trevor Daniel's works have the highest danceability values, based on a comparison of each work's average tempo, rhythm stability, beat strength and overall regularity.
Characteristics of top singers
In this part, we take one artist as an example and do some detailed exploratory analysis using the spotifyr package. Here we choose Coldplay, our favourite artist.
First, we loaded all the Coldplay albums available on Spotify and dropped the duplicates (some live tour albums duplicate existing ones). We then calculated the average valence of each album; the results are shown in the following table. According to the Spotify tracks documentation, valence is measured from 0.0 to 1.0 and describes the musical positiveness conveyed by a track: tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The highest average valence among these albums is 0.30 and the lowest is 0.18, which means Coldplay's songs usually sound more negative than positive to the audience.
| album_name | valence |
|---|---|
| Everyday Life | 0.30 |
| Viva La Vida or Death and All His Friends | 0.26 |
| Mylo Xyloto | 0.25 |
| Parachutes | 0.23 |
| A Head Full of Dreams | 0.23 |
| X&Y | 0.22 |
| Ghost Stories | 0.21 |
| Love in Tokyo | 0.19 |
| A Rush of Blood to the Head | 0.18 |
Second, we make a density plot to show the range and density of valence for each album. From the following figure, we can see that "Everyday Life" has the widest range of valence; that is, this album contains the most varied emotions. Meanwhile, "A Rush of Blood to the Head" has a narrow range of valence, with the density centred on lower valence values. The audience would likely feel negative emotions such as sadness, depression or anger when listening to this album. This finding surprised us because "A Rush of Blood to the Head" is the second-best album in "The Coldplay Albums Ranked", so we decided to look at it in more depth.
Lastly, we analysed the sentiment of this album to see whether an album's valence is associated with its lyrics. The average sentiment value of this album is -0.47 using the "afinn" lexicon. We also analysed the sentiment of the lyrics using the "bing" lexicon. The following table shows the most frequent words and their sentiment in this album, and the figure below shows more intuitively the frequency of words that appear more than once. We can easily see that negative words appear more often than positive ones.
As a result, we can say with confidence that, both in sound and in lyrics, this album conveys negative emotions. But this does not change the fact that people consider "A Rush of Blood to the Head" one of Coldplay's best albums. Evidently, the audience's love for an album is not entirely determined by the album's positiveness.
| word | sentiment | n |
|---|---|---|
| love | positive | 7 |
| easy | positive | 4 |
| fall | negative | 4 |
| grace | positive | 4 |
| miss | negative | 4 |
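The sentiment counts above can be sketched with the tidytext workflow; this is a minimal sketch assuming `album_lyrics` is a data frame with one row per lyric line in a `lyric` column (for example as returned by the genius package):

```r
library(dplyr)
library(tidytext)

# Tokenise the lyrics into words, join each word to the "bing" lexicon,
# and count the most frequent sentiment-bearing words (as in the table above).
word_sentiments <- album_lyrics %>%
  unnest_tokens(word, lyric) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

# The "afinn" lexicon instead scores words from -5 to 5; averaging those
# scores gives the album-level sentiment value of -0.47 reported above.
```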
In this part, we analysed the audio features of all the songs in our dataset. The figure below shows what these features look like in different genres. Here is a simple explanation of each feature:
acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.
danceability: Danceability describes how suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.
duration_ms: The duration of the track in milliseconds. (And duration_s in seconds, rounded.)
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
instrumentalness: Predicts whether a track contains no vocals.
key: The key the track is in.
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
loudness: The overall loudness of a track in decibels (dB).
mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
speechiness: Speechiness detects the presence of spoken words in a track.
tempo: The overall estimated tempo of a track in beats per minute (BPM).
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
The next three box plots explore the differences in music attributes between the Music Genres. First, the mapping between color and Music Genre is established and stored in a tibble called "COLORS". This lets the Music Genres be clearly distinguished by color, so the specific characteristics of each genre can be judged from the box plots.
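The color mapping and the first box plot can be sketched like this; the hex colors are placeholders, and `clean_songs` is a hypothetical name for the cleaned data frame:

```r
library(dplyr)
library(ggplot2)

# One color per Music Genre (placeholder hex values).
COLORS <- tibble::tibble(
  playlist_genre = c("edm", "latin", "pop", "r&b", "rap", "rock"),
  color = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b")
)

# Box plot of valence by genre, filled with the colors defined above.
clean_songs %>%
  ggplot(aes(x = playlist_genre, y = valence, fill = playlist_genre)) +
  geom_boxplot() +
  scale_fill_manual(values = setNames(COLORS$color, COLORS$playlist_genre)) +
  labs(x = "Music Genre", y = "Valence")
```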
The first plot shows the relationship between Music Genre and valence. It is clear from the plot that Latin has the highest valence and EDM the lowest. This suggests Latin music is more powerful at conveying musical positiveness, while EDM sounds more negative. The other four genres show no obvious trend in this respect, lying mostly between 0.3 and 0.7.
Average valence by Music Genre
The second plot shows the relationship between Music Genre and energy. Energy is a measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. The plot shows that EDM has the highest energy while R&B has the lowest, which reflects the styles of these two genres. EDM mostly makes people feel fast, loud and noisy, whereas R&B is mainly lyrical, slow and quiet, bringing less energy to listeners. Similarly, rock has always been famous for its flexible, bold expression and passionate rhythm, and it ranks second only to EDM.
Average Energy by Music Genre
Finally, this plot shows the relationship between Music Genre and speechiness. Speechiness detects the presence of spoken words in a track: the more words a song contains, the closer the value is to 1.0. This attribute is interesting because it indicates whether artists tend to express their ideas through lyrics or through the melody of the music.
Focusing on the plot, it is no surprise that rap occupies first place, since the hallmark of rap is rapidly delivering a series of rhyming lyrics over a mechanical rhythmic backing. It is worth noting that rock and pop are the lowest, which suggests these two genres tend to move their audience with melody or rhythm rather than lyrics.
Average speechiness by Music Genre
Having described the contents and interrelations of the three plots in detail, many related attributes remain unexplored. Our aim was to present the most interesting parts; anyone interested can easily continue the analysis.
After reviewing the relationships between audio features and Music Genres, we can now discuss the genres in detail. The table below shows the distribution of each genre in this dataset. The most frequent genre is "edm", while "rock" appears least often.
| playlist_genre | n |
|---|---|
| edm | 6043 |
| rap | 5746 |
| pop | 5507 |
| r&b | 5431 |
| latin | 5155 |
| rock | 4951 |
The following figure shows the average popularity of songs released at different times. To show the results clearly and to ease comparison, we split the results by genre. (1) EDM has been around since the 1970s, but the popularity of EDM released over the past 50 years is 40 or even less, so EDM is not the mainstream music type today. (2) Latin and pop music date back to the 1960s. The 1970s were the golden age for Latin songs, while the 1960s and 1970s were the golden age for pop music; these old songs are still popular now. (3) R&B went through ups and downs: songs released from the 1980s to the 2000s are less popular than the others. (4) Rap has been around since the 1960s, and the oldest rap songs remain the most popular, while those released in the 2000s now have the lowest popularity. (5) The popularity of rock music released in different periods is quite stable, although tracks from the 1960s to the 1990s are more popular than the rest.
The correlations among song features are very helpful for exploring why some works become popular. The correlation plot shows that while every song is specific and unique, songs can be summarised by ten musical attributes. There are three types of relationship between attributes: negative correlation, positive correlation, or no correlation at all. This is very useful for analysing the properties of music works later on.
For example, if a song has high energy, it is also likely to have high loudness and a high probability of not being acoustic. If someone likes songs that are more active or have higher valence, they could explore potential favourites with high danceability, high energy and more vocal content. Clearly the correlation plot is meaningful: it plays an irreplaceable role in analysing songs or selecting favourite song attributes, and the remaining effects can be explored later.
After describing the audio features individually, we now explore whether these features contribute to higher popularity. First we plotted each audio feature against popularity in the following figure. It shows that liveness has a negative relationship with popularity, and that there is no absolute relationship between valence and popularity: a higher valence does not necessarily make a song more popular. This is consistent with our sentiment analysis.
## # A tibble: 43 x 3
## # Groups: genre [6]
## genre decade mean_popularity
## <chr> <chr> <dbl>
## 1 edm 1970 24
## 2 edm 1980 37.4
## 3 edm 1990 39.3
## 4 edm 2000 20.7
## 5 edm 2010 35.1
## 6 edm 2020 40.3
## 7 latin 1960 26
## 8 latin 1970 63.2
## 9 latin 1980 39.9
## 10 latin 1990 39.8
## # ... with 33 more rows
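The decade summary shown above can be sketched as follows, assuming the release year is parsed from `track_album_release_date` (the parsing step is a hypothetical reconstruction):

```r
library(dplyr)

# Average popularity per genre and decade, matching the tibble above.
genre_decade <- spotify_songs %>%
  mutate(year = as.numeric(substr(track_album_release_date, 1, 4)),
         decade = as.character(floor(year / 10) * 10)) %>%
  group_by(genre = playlist_genre, decade) %>%
  summarise(mean_popularity = mean(track_popularity))
```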
Also, we are not sure whether the dot plots above can directly reveal the relationship between popularity and the audio features, so we also explore whether these features contribute to higher popularity using a linear regression model. Here we filtered out songs with a popularity of 0, since a popularity of 0 is not meaningful in this model. The following table shows all the audio features with a p-value less than 0.05. We conclude that danceability and valence contribute most to higher popularity. Acousticness, key, loudness, mode and tempo also have a positive relationship with popularity, while energy, instrumentalness, liveness and speechiness have a negative relationship, which is consistent with the conclusions from the dot plots.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 744.18 | 15.80 | 47.10 | 0.00 |
| acousticness | 18.36 | 6.85 | 2.68 | 0.01 |
| danceability | 35.67 | 9.91 | 3.60 | 0.00 |
| duration_ms | 0.00 | 0.00 | -14.03 | 0.00 |
| energy | -253.44 | 11.32 | -22.39 | 0.00 |
| instrumentalness | -109.25 | 6.07 | -18.01 | 0.00 |
| key | 0.95 | 0.35 | 2.69 | 0.01 |
| liveness | -23.39 | 8.47 | -2.76 | 0.01 |
| loudness | 14.09 | 0.61 | 23.07 | 0.00 |
| mode | 6.61 | 2.58 | 2.57 | 0.01 |
| speechiness | -43.00 | 12.83 | -3.35 | 0.00 |
| tempo | 0.18 | 0.05 | 3.72 | 0.00 |
| valence | 37.22 | 6.08 | 6.12 | 0.00 |
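The regression behind this table can be sketched as follows, assuming the data frame is `spotify_songs`; `broom::tidy()` converts the fit into the tidy table shown above:

```r
library(dplyr)
library(broom)

fit <- spotify_songs %>%
  filter(track_popularity > 0) %>%   # a popularity of 0 is not meaningful here
  lm(track_popularity ~ danceability + energy + key + loudness + mode +
       speechiness + acousticness + instrumentalness + liveness + valence +
       tempo + duration_ms, data = .)

# Keep only the terms significant at the 5% level.
tidy(fit) %>% filter(p.value < 0.05)
```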
In this section we explore how the music characteristics change over time, and then look at the top five artists to see whether the characteristics of an individual artist's music are also changing over time.
We have previously discussed the different music components, the correlations among them, and track_popularity. Another thing we can look at is the trend of the music components over time. Over time, more musical instruments and more genres have been introduced, changing our musical taste. Analysing the trend of the music components over time is therefore an interesting direction, since it helps us understand how the characteristics of music are changing. As music evolves, we want to see how the characteristics evolve: are the music characteristics of 1957 similar to those of 2019? The trend of music genres has already been analysed and is changing, so looking at the trend of the music components broadens the analysis further, to see whether the components are also changing.
The trend of music components over the years.
To answer the research question "How are the music characteristics changing over time?", we plot the values of the different music components against year, faceted by component. We can see from Figure @ref(fig:components-trend) that some components change over time.
Acousticness, mode, tempo and valence tend to decrease over time, while danceability and energy tend to increase. The decreasing valence indicates that new songs tend to be sadder, as more and more unhappy songs are released every year: "Happiness and brightness in music has declined, while sadness increased in the last 30 years or so" (https://entertainment.inquirer.net/274757/people-prefer-happy-music-sad-songs-trend-past-30-years-study). Research suggests that the use of positive emotions has declined. However, the high variability in valence values means there are still varied types of songs. The increasing danceability and energy are due to the rise of electronic music.
Other music components tend to follow a steady trend over time, although instrumentalness, speechiness, liveness and loudness show more variance. Loudness tends to range between -20 and 0 decibels, apart from some outliers where the loudness is very low.
The music characteristics might change because of the emergence of new and more modern types of music. But are the characteristics also changing for an individual artist? It would be interesting to look at the trend to see whether one specific artist also shifts their music characteristics over time. Is the same artist making the same kind of music through time?
To answer this question, we look at the top five artists and their music-characteristic trends over time. We do not use track_popularity to determine the top five here, because some artists in that top five only have songs in a single year, making it impossible to compare characteristics over time. Instead, we use the number of tracks (track_name). The following table displays the top five artists:
| track_artist | Total |
|---|---|
| Queen | 111 |
| Martin Garrix | 73 |
| David Guetta | 64 |
| Logic | 62 |
| Hardwell | 61 |
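The table above can be reproduced with a simple count; this sketch assumes the cleaned data frame is `spotify_songs`:

```r
library(dplyr)

# Count tracks per artist and keep the five largest. Duplicates of the same
# track across playlists may need removing first.
top5artists <- spotify_songs %>%
  count(track_artist, name = "Total", sort = TRUE) %>%
  slice_head(n = 5)
```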
Unlike the other four artists, Queen has a different music timeline, so for a clearer visualisation we plot Queen separately from the others.
The musical components of the top 5 artists (excluding Queen) over time.
It can be seen from Figure @ref(fig:top4-components) that the music components of individual artists shift over time, and some components are quite volatile. David Guetta, who works in both pop and edm, shows a huge change in danceability and energy: these days he tends to produce music with lower danceability but higher energy, and his songs are also shorter than in the past. The valence of his songs changes over time too: 2010 and 2011 were the years when he produced more positive songs, and his songs have tended to become less positive since. Hardwell focuses on edm, which explains the low speechiness. His instrumentalness was quite high in 2013, but it has shifted towards zero over time. Unlike David Guetta, Hardwell tends to produce more positive songs these days.
Being a rapper, it makes sense that Logic's instrumentalness is very close to zero. The key component of his music follows a decreasing trend over time, while the valence, acousticness and liveness of his songs are quite volatile. One interesting thing about Hardwell is that his danceability and tempo were decreasing around 2016, indicating that he produced less upbeat, slower-tempo music that year; after 2016 they increased again. Martin Garrix is an edm artist with decreasing energy, loudness and duration over time. In 2013 and 2014 he produced music with relatively high instrumentalness, but it dropped across the years. Similar to Hardwell, Martin Garrix has created more positive songs over the years, as indicated by the increasing valence.
One clear observation is that all of them have relatively low speechiness, as they all work in either edm or rap. The most volatile component of all is mode.
The musical components of Queen over time.
Figure @ref(fig:queen-trend) shows the music components of Queen's songs over time, and again the characteristics evolve. Queen is a rock band, which explains the relatively high energy level. A very interesting point is that danceability, energy, liveness, loudness and speechiness all drop in 1992, while acousticness rises sharply compared with the preceding years. This is because Queen produced only one song in 1992, the popular "We Are The Champions", which is quite different from Queen's other songs, with less energy and lower danceability.
It can be concluded that music characteristics evolve over time, and that despite the changes, each artist retains their own uniqueness in the music they produce.
We also want to look at the relationship between each artist and their musical characteristics (with artist on the x-axis and characteristics on the y-axis), and at the relationship between decade and characteristics, for example in a scatter plot matrix coloured by artist.
a <- top5artists %>%
  mutate(decade = round(as.numeric(year) - 4.5, -1)) %>%
  pivot_longer(danceability:valence, names_to = "characteristics", values_to = "values")
a$year <- as.Date(a$year, format = "%Y")
aa <- characteristics_topartists %>% group_by(year, track_artist)
ggplot() +
  geom_line(data = aa, aes(x = year, y = mean, color = track_artist)) +
  geom_point(data = a, aes(x = year, y = values, color = track_artist)) +
  facet_wrap(~characteristics, scales = "free")
The most interesting component here is valence, so we look at it more closely. To see whether the characteristics are changing over time, we can also look at the correlation between each characteristic and year.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 2011.04 | 0.08 | 24943.56 | 0 |
##
## Call:
## lm(formula = year ~ acousticness, data = spotify_songs)
##
## Coefficients:
## (Intercept) acousticness
## 2011.0434 0.5351
##
## Call:
## lm(formula = year ~ danceability, data = spotify_songs)
##
## Coefficients:
## (Intercept) danceability
## 2002.8 12.7
After this exploratory data analysis, our group reached answers to those questions. First, there are both positive and negative correlations between audio features and track popularity. However, as we all know, the value of an artwork cannot be measured by numbers alone. The popularity of music works depends more on the artist's own fame, creative talent or singing ability, or external factors such as world trends. Deliberately catering to particular audio features when writing songs is unlikely to be enough for success.
Secondly, each top artist has their own artistic characteristics and will be loved by a specific group of people. Top artists do not create music according to the trend; instead, they create their own trend for the world.
As for the six music genres that stand out in modern music, each also has its own internal characteristics. It is hard to pin down the reasons for their success precisely because of their unique styles; what we can do is determine the genre of each song according to its style.
Finally, although Coldplay is one of the representative rock artists, their works contain more negative emotions. This is in line with the rebellious and critical spirit of rock music, a spirit that has long been embraced by young people of different backgrounds. They stick to their own style, try unconventional musical approaches as far as possible, and reach people's hearts with straightforward, profound, and moving melodies. This confirms our analysis: the negative emotions conveyed in Coldplay's lyrics do not hurt their popularity but instead help make them top artists. In conclusion, track popularity depends more on the singer's own ability and attitude than on audio features. The main role of audio features is to reflect the singer's musical style rather than to increase popularity.
The R packages used in this report: Wickham (2016), Waring et al. (2020), Wei and Simko (2017), Arnold (2019), Xie, Cheng, and Tan (2020), Wickham, Hester, and Chang (2020), Wickham, Hester, and Francois (2018), Wickham et al. (2019), Wickham et al. (2020), Grolemund and Wickham (2011), Xie (2020), Zhu (2019), Robinson, Hayes, and Couch (2020), Auguie (2017), Parry and Barr (2020), Silge and Robinson (2016), Hvitfeldt (2020), Thompson (2017), and Wilke (2020).
Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.
Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Hvitfeldt, Emil. 2020. Textdata: Download and Load Various Text Datasets. https://CRAN.R-project.org/package=textdata.
Parry, Josiah, and Nathan Barr. 2020. Genius: Easily Access Song Lyrics from Genius.com. https://CRAN.R-project.org/package=genius.
Robinson, David, Alex Hayes, and Simon Couch. 2020. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.
Thompson, Charlie. 2017. "Spotifyr: R Wrapper for the 'Spotify' Web API." https://github.com/charlie86/spotifyr.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2020. Skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.
Wei, Taiyun, and Viliam Simko. 2017. R Package "Corrplot": Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Winston Chang. 2020. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wilke, Claus O. 2020. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.
Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2020. DT: A Wrapper of the Javascript Library ’Datatables’. https://CRAN.R-project.org/package=DT.
Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.